Primary Analysis
Phenotype Modeling
Phenotype and covariate data from IMS v10, along with indicator variables reporting
genotyping platform batch and Other Asian raw ancestry calls from graf,
were processed and formatted into model matrix files. Continuous traits were
inverse normal transformed within ancestry group, stratified by sex, with random resolution of ties.
Categorical traits were processed into individual binary contrasts between a single reference
group (category 0, with the largest number of subjects) and each non-reference group;
non-reference groups with fewer than 10 subjects were combined into a single meta-group,
following the PLCO analysis plan document guidelines. All categorical covariates were similarly
processed into binary covariates to maintain compatibility with
analysis tools that lack direct support for qualitative covariates.
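The two transformations described above can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the record keys (`ancestry`, `sex`, `value`), the function names, the `meta` label, and the rank offset of 0.5 are all assumptions, since the text does not specify them. The sketch also assumes that subjects outside a given contrast are set to missing, and that all small non-reference groups are pooled into one meta-group.

```python
import random
from collections import Counter, defaultdict
from statistics import NormalDist


def inverse_normal_transform(values, rng):
    """Rank-based inverse normal transform with ties broken at random."""
    n = len(values)
    # A random secondary sort key gives tied values an arbitrary order.
    order = sorted(range(n), key=lambda i: (values[i], rng.random()))
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # The 0.5 offset is an assumption; the source does not state one.
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return out


def transform_within_strata(records, seed=0):
    """Apply the transform separately within each (ancestry, sex) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, rec in enumerate(records):
        strata[(rec["ancestry"], rec["sex"])].append(idx)
    result = [None] * len(records)
    for indices in strata.values():
        values = [records[i]["value"] for i in indices]
        for i, t in zip(indices, inverse_normal_transform(values, rng)):
            result[i] = t
    return result


def binary_contrasts(categories, reference=0, min_count=10, meta_label="meta"):
    """Build one reference-vs-group indicator per non-reference group.

    Non-reference groups with fewer than min_count subjects are pooled
    into a single meta-group. Subjects outside a contrast are None.
    """
    counts = Counter(categories)
    small = {c for c, k in counts.items() if c != reference and k < min_count}

    def label(c):
        return meta_label if c in small else c

    groups = sorted({label(c) for c in categories if c != reference}, key=str)
    return {
        g: [0 if c == reference else (1 if label(c) == g else None)
            for c in categories]
        for g in groups
    }
```

Fixing the seed makes the random tie-breaking reproducible between runs, which matters if model matrices are regenerated.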
Primary Analysis with BOLT-LMM
For each platform/ancestry combination with at least 3000 subjects, chip subsets corresponding to these data were lifted from GRCh37 to GRCh38 with liftOver. Linear mixed model association with BOLT-LMM was run with the following parameters:
--bgenFile {filename}
--sampleFile {filename}
--lmm
--LDscoresFile {filename}
--statsFile {filename}
--statsFileBgenSnps {filename}
--phenoFile {filename}
--phenoCol {column name}
--covarFile {filename}
--qCovarCol {covariate list}
--LDscoresMatchBp
--geneticMapFile {filename}
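As an illustration of how these parameters might be assembled for one phenotype, the sketch below builds an argument list containing only the flags listed above. The helper name, the dictionary keys, and the executable name `bolt` are assumptions made for the example. It also assumes the covariate list is passed by repeating `--qCovarCol` once per column, which is one of the forms BOLT-LMM accepts.

```python
import shlex


def build_bolt_lmm_command(files, pheno_col, covariates, executable="bolt"):
    """Return the BOLT-LMM argument list for one phenotype column.

    `files` is a hypothetical dict of input/output paths; its keys are
    illustrative and not part of BOLT-LMM itself.
    """
    command = [
        executable,
        "--bgenFile", files["bgen"],
        "--sampleFile", files["sample"],
        "--lmm",
        "--LDscoresFile", files["ld_scores"],
        "--statsFile", files["stats"],
        "--statsFileBgenSnps", files["stats_bgen"],
        "--phenoFile", files["pheno"],
        "--phenoCol", pheno_col,
        "--covarFile", files["covar"],
    ]
    # Assumption: one --qCovarCol flag per quantitative covariate column.
    for name in covariates:
        command += ["--qCovarCol", name]
    command += ["--LDscoresMatchBp", "--geneticMapFile", files["genetic_map"]]
    return command


if __name__ == "__main__":
    example_files = {
        "bgen": "chip.bgen", "sample": "chip.sample",
        "ld_scores": "ldscores.tab.gz", "stats": "stats.tab",
        "stats_bgen": "stats.bgen.tab", "pheno": "model_matrix.tsv",
        "covar": "model_matrix.tsv", "genetic_map": "map.txt",
    }
    cmd = build_bolt_lmm_command(example_files, "trait", ["age", "pc1"])
    # Print a shell-quoted command for inspection before running it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting concerns.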
Primary Analysis with SAIGE
For each platform/ancestry combination with at least 1000 subjects and 30 cases for a given model matrix, chip subsets corresponding to these data were lifted from GRCh37 to GRCh38 with liftOver. Logistic mixed model association with SAIGE was run in two passes with the following functions and settings.
For round one:
SAIGE::fitNULLGLMM
plinkFile {file prefix}
phenoFile {filename}
phenoCol {column name}
sampleIDColinphenoFile {column name}
covarColList {covariate list}
nThreads 4
traitType "binary"
outputPrefix {file prefix}
IsSparseKin TRUE
relatednessCutoff 0.05
For round two:
SAIGE::SPAGMMATtest
bgenFile {filename}
bgenFileIndex {filename}
vcfField DS
chrom {chromosome}
minMAF 0.01
GMMATmodelFile {filename}
sampleFile {filename}
minMAC 1
varianceRatioFile {filename}
SAIGEOutputFile {filename}
IsOutputAFinCaseCtrl TRUE
sparseSigmaFile {filename}
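The eligibility rule stated at the start of this section (at least 1000 subjects and 30 cases per platform/ancestry combination) can be sketched as a simple filter. The function name and the tuple layout of the input are assumptions for illustration only; the pipeline's actual selection code is not described here.

```python
from collections import defaultdict


def saige_eligible_groups(subjects, min_subjects=1000, min_cases=30):
    """Return the (platform, ancestry) combinations that meet both thresholds.

    `subjects` is a hypothetical iterable of (platform, ancestry, is_case)
    tuples, one per subject in a given model matrix.
    """
    totals = defaultdict(int)
    cases = defaultdict(int)
    for platform, ancestry, is_case in subjects:
        key = (platform, ancestry)
        totals[key] += 1
        if is_case:
            cases[key] += 1
    # Both the subject count and the case count must reach their minimum.
    return sorted(
        key for key in totals
        if totals[key] >= min_subjects and cases[key] >= min_cases
    )
```

The same counting approach would cover the BOLT-LMM threshold by setting `min_subjects=3000` and `min_cases=0`.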
Primary Analysis Postprocessing
After each analysis, the native result format was converted to the file format agreed upon with CBIIT. Allele frequencies in the raw results were updated to approximate TOPMed reference frequencies, which were estimated from test imputations of 1000 Genomes subjects from each supercontinent against the TOPMed 5b reference panel.